A Fault-Tolerant System for Balancing the Load of Data-Parallel Applications

Authors

  • Samuel H. Russ
  • Ioana Banicescu
  • Mark L. Bilderback
  • Sheikh K. Ghafoor
Abstract

In distributed computing environments, fault tolerance is an important objective, especially for parallel applications. Many distributed computing environments achieve fault tolerance through periodic checkpointing. This approach is relatively easy to implement and can be considered equivalent to task migration. However, such environments have two main disadvantages. First, any work done since the most recent checkpoint is lost when a fault occurs. Second, these systems rely heavily on task migration as their only load-balancing mechanism. This paper presents a system that overcomes these shortcomings through task duplication and by integrating data migration into task migration as a load-balancing mechanism. It also presents results from a preliminary implementation.
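The abstract does not say how the system decides what data to migrate. As a hedged illustration only, the Python sketch below shows one common way to rebalance a data-parallel job by moving data instead of tasks: it assumes each worker's processing rate (items per second) has already been measured and that data items are interchangeable, splits the items in proportion to those rates (largest-remainder rounding), and then greedily pairs overloaded workers with underloaded ones. The functions `target_sizes` and `migration_plan`, the worker names, and the rates are hypothetical and are not taken from the paper.

```python
import math
from fractions import Fraction


def target_sizes(total: int, rates: dict[str, float]) -> dict[str, int]:
    """Share `total` data items among workers in proportion to their measured rates."""
    rate_sum = sum(Fraction(r) for r in rates.values())
    # Exact (rational) proportional share of each worker, avoiding float rounding.
    shares = {w: total * Fraction(r) / rate_sum for w, r in rates.items()}
    sizes = {w: math.floor(s) for w, s in shares.items()}
    # Flooring leaves fewer than len(rates) items unassigned; give one each to
    # the workers with the largest fractional remainders (largest-remainder rule).
    leftover = total - sum(sizes.values())
    by_fraction = sorted(shares, key=lambda w: shares[w] - sizes[w], reverse=True)
    for w in by_fraction[:leftover]:
        sizes[w] += 1
    return sizes


def migration_plan(current: dict[str, int], target: dict[str, int]) -> list[tuple[str, str, int]]:
    """Greedily pair overloaded workers with underloaded ones; returns (src, dst, count)."""
    senders = [[w, current[w] - target[w]] for w in current if current[w] > target[w]]
    receivers = [[w, target[w] - current[w]] for w in current if current[w] < target[w]]
    plan = []
    i = j = 0
    while i < len(senders) and j < len(receivers):
        count = min(senders[i][1], receivers[j][1])
        plan.append((senders[i][0], receivers[j][0], count))
        senders[i][1] -= count
        receivers[j][1] -= count
        if senders[i][1] == 0:
            i += 1
        if receivers[j][1] == 0:
            j += 1
    return plan


if __name__ == "__main__":
    # Hypothetical example: worker "c" was measured to be twice as fast as "a" and "b".
    rates = {"a": 1.0, "b": 1.0, "c": 2.0}
    current = {"a": 333, "b": 333, "c": 334}
    target = target_sizes(sum(current.values()), rates)
    print(target)                            # {'a': 250, 'b': 250, 'c': 500}
    print(migration_plan(current, target))   # [('a', 'c', 83), ('b', 'c', 83)]
```

Moving data rather than whole tasks lets every worker keep running the same program while only the share of input it processes changes. When the system triggers migration, and how task duplication interacts with it, is not described in the abstract.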

Similar resources

Parleda: a Library for Parallel Processing in Computational Geometry Applications

ParLeda is a software library that provides the basic primitives needed for parallel implementation of computational geometry applications. It can also be used in implementing a parallel application that uses geometric data structures. The parallel model that we use is based on a new heterogeneous parallel model named HBSP, which is based on BSP and is introduced here. ParLeda uses two main lib...

Fault Tolerant Scheduling for Parallel Loops on Shared Memory Systems

While multicore/multiprocessor systems achieve significant speedup for many applications by exploiting loop level parallelism, they also suffer from increased reliability problems as a result of ever scaling device size. This paper addresses the reliability of loop dominated applications, aiming to execute parallel loops efficiently in the presence of various types of hardware faults. In this p...

Application Recovery in Parallel Programming Environment

In this paper, the fault-tolerance feature of the TOPAS parallel programming environment for distributed systems is presented. TOPAS automatically analyzes data dependences among tasks and synchronizes data, which reduces the time needed for parallel program development. TOPAS also provides support for scheduling, load balancing and fault tolerance. The main topic of this paper is to present the solut...

Fault-Tolerant Parallel Programming with Atomic Actions

The Pact (parallel actions) parallel programming environment provides an easy-to-use parallel execution and synchronization model based on task parallelization. To give the programmer an abstraction for global data (even on distributed memory machines) the Pact runtime system uses virtual shared memory. Execution’s efficiency is improved with data-dependent dynamic load balancing and latency-ma...

NASA Contractor Report 181938 Investigation of the Applicability of a Functional Programming Model to Fault Tolerant Parallel Processing for Knowledge-Based Systems

In a fault-tolerant parallel computer, a functional programming model can facilitate distributed checkpointing, error recovery, load balancing, and graceful degradation. Such a model has been implemented on the Draper Fault Tolerant Parallel Processor (FTPP). When used in conjunction with the FTPP's fault detection and masking capabilities, this implementation results in a graceful degradation...


Publication date: 2007